

New Alignment Methods for Discriminative Book Summarization

Authors: David Bamman; Noah A. Smith
Publication date: 06/05/2013

We consider the unsupervised alignment of the full text of a book with a human-written summary. This presents challenges not seen in other text alignment problems, including a disparity in length and, consequent to this, a violation of the expectation that individual words and phrases should align, since large passages and chapters can be distilled into a single summary phrase. We present two new methods, based on hidden Markov models, specifically targeted to this problem, and demonstrate gains on an extractive book summarization task. While there is still much room for improvement, unsupervised alignment holds intrinsic value in offering insight into what features of a book are deemed worthy of summarization. Comment: This paper reflects work in progress.

arXiv.org e-Print Archive

CiteSeerX
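The abstract does not spell out either of the two HMM variants, so the following is only a generic sketch of HMM-based book-to-summary alignment under stated assumptions, not the authors' models. Hidden states are book segments and each summary sentence is an observation. Transitions may only stay on the current segment or move forward (assumed: the summary follows the book's order), uniformly over the allowed targets. Emissions use an add-one-smoothed unigram model of each segment. The names `viterbi_align` and `emission_logprob`, the toy text, and all modeling choices are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical whitespace tokenizer; a real system would do more.
    return text.lower().split()


def emission_logprob(sentence, segment, vocab_size, alpha=1.0):
    # log P(sentence | segment) under an add-alpha smoothed unigram
    # model estimated from the segment's words (assumed emission model).
    counts = Counter(tokenize(segment))
    total = sum(counts.values())
    denom = total + alpha * vocab_size
    return sum(math.log((counts[w] + alpha) / denom) for w in tokenize(sentence))


def viterbi_align(summary_sents, book_segments):
    """Map each summary sentence to a book segment index (monotonic)."""
    if not summary_sents:
        return []
    vocab = {w for t in summary_sents + book_segments for w in tokenize(t)}
    n, m = len(book_segments), len(summary_sents)
    emit = [[emission_logprob(s, b, len(vocab)) for b in book_segments]
            for s in summary_sents]
    neg_inf = float("-inf")
    score = [[neg_inf] * n for _ in range(m)]
    back = [[0] * n for _ in range(m)]
    # Uniform start distribution over segments (assumption).
    for j in range(n):
        score[0][j] = math.log(1.0 / n) + emit[0][j]
    for t in range(1, m):
        for j in range(n):
            best, best_i = neg_inf, 0
            # Only i <= j may transition to j; from segment i each of the
            # n - i forward-or-stay targets has probability 1 / (n - i).
            for i in range(j + 1):
                cand = score[t - 1][i] + math.log(1.0 / (n - i))
                if cand > best:
                    best, best_i = cand, i
            score[t][j] = best + emit[t][j]
            back[t][j] = best_i
    # Backtrace from the best final state.
    j = max(range(n), key=lambda k: score[m - 1][k])
    path = [j]
    for t in range(m - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    return path[::-1]


# Hypothetical toy "book" split into segments, plus a two-sentence summary.
segments = [
    "ishmael meets queequeg at an inn in new bedford",
    "they sign on to the pequod a whaling ship",
    "captain ahab hunts the white whale and the ship sinks",
]
summary = ["ishmael befriends queequeg", "ahab hunts the whale"]
print(viterbi_align(summary, segments))
```

On this toy input the first summary sentence aligns to segment 0 and the second to segment 2, skipping segment 1 entirely. Because states are whole segments rather than words, a single short summary sentence can absorb a long passage, which is the length disparity the abstract highlights.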

Social Meme-ing: Measuring Linguistic Variation in Memes

Authors: David Bamman; David Jurgens; Naitian Zhou
Publication date: 15/11/2023

Much work in the space of NLP has used computational methods to explore sociolinguistic variation in text. In this paper, we argue that memes, as multimodal forms of language composed of visual templates and text, also exhibit meaningful social variation. We construct a computational pipeline to cluster individual instances of memes into templates and semantic variables, taking advantage of their multimodal structure in doing so. We apply this method to a large collection of meme images from Reddit and make available the resulting SemanticMemes dataset of 3.8M images clustered by their semantic function. We use these clusters to analyze linguistic variation in memes, discovering not only that socially meaningful variation in meme usage exists between subreddits, but that patterns of meme innovation and acculturation within these communities align with previous findings on written language.

arXiv.org e-Print Archive
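The abstract does not say how variation between subreddits is measured. As a hedged illustration only, one simple way to compare how two communities use meme templates is the Jensen-Shannon divergence between their template-frequency distributions. This is an assumed measure, not necessarily the one the authors use, and the template labels below are hypothetical stand-ins for cluster IDs.

```python
import math
from collections import Counter


def template_distribution(labels, support):
    # Relative frequency of each template label, in a fixed support order.
    counts = Counter(labels)
    total = sum(counts.values())
    return [counts[k] / total for k in support]


def js_divergence(p, q):
    # Jensen-Shannon divergence in bits: 0 for identical distributions,
    # 1 for distributions with disjoint support.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i = 0 contribute nothing; y_i > 0 whenever x_i > 0
        # because y is the mixture m.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical template assignments for memes posted in two communities.
community_a = ["drake", "drake", "distracted_boyfriend"]
community_b = ["drake", "distracted_boyfriend", "distracted_boyfriend"]
support = sorted(set(community_a) | set(community_b))
p = template_distribution(community_a, support)
q = template_distribution(community_b, support)
print(round(js_divergence(p, q), 4))
```

The divergence is symmetric and bounded, which makes it convenient for comparing many pairs of communities; a finer analysis would compare distributions over semantic variables within a template rather than over templates alone.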